HUMAN NEUROSCIENCE 



The neural control of singing 

Jean Mary Zarate*

Department of Psychology, New York University, New York, NY, USA 
Edited by: 

Eckart Altenmüller, University of
Music and Drama Hannover, 
Germany 

Reviewed by: 

Boris Kleber, McGill University, 
Canada 

Hermann Ackermann, University of 
Tuebingen, Germany 

*Correspondence: 

Jean Mary Zarate, Department of 
Psychology, New York University, 
6 Washington Place, Room 275-276, 
New York, NY 10003, USA 
e-mail: jean.m.zarate@nyu.edu 



Keywords: auditory processing, 
somatosensory, vocal pitch 

Most of the literature on sensory-motor control in music 
production and training-induced plasticity focuses on trained 
instrumental musicians or learning paradigms with musical 
instruments (e.g., learning to play short piano melodies, etc.). 
Singing, however, provides a unique opportunity to examine 
sensory-motor processes during musical production, since the 
instrument is already contained within the body; there is no need 
to create artificial instruments to assess motor control mecha- 
nisms with neuroimaging or any other experimental approach. 
Moreover, the adult vocal apparatus is highly trained to pro- 
duce nuanced utterances in both song and speech. Across their 
lifetime, healthy non-musicians have sung (or have attempted 
to sing) a full repertoire of songs in socially and culturally specific
settings (e.g., "Happy Birthday," their national anthem, etc.).
Additionally, healthy individuals can control their vocal pitch 
and/or output intensity to indicate the intent of a sentence (e.g., 
declarative statements vs. questions vs. commands), set the emo- 
tional context for a conversation (e.g., happiness, anger, sadness), 
or in tonal languages, distinguish between words and their mean- 
ings. Singers, on the other hand, undergo many years of extensive 
sensory-motor training and practice to exert much finer vocal 
control during more difficult tasks, such as singing fast vocal runs 
(e.g., melismata, melodic embellishments, etc.) or maintaining a 
melodic passage as someone else simultaneously sings a harmonic 
line. Therefore, using singing tasks to test groups with different 
levels of singing experience is a rare opportunity to determine 
how musical experience specifically enhances sensory-motor control
of this particular instrument, beyond the remarkable feats
it already can perform. However, the mechanisms by which the 
vocal instrument is precisely controlled for singing are highly 
complex and thus require multiple networks for vocal motor 
control and sensory feedback processing. 

SENSORY-MOTOR CONTROL OF VOCALIZATION
SENSORY-MOTOR CONTROL OBSERVED FROM THE VOCAL TRACT 

When air passes through the glottis (opening of the larynx) and 
causes the vocal folds surrounding the glottis to vibrate at a 
particular rate, the resulting vibration rate determines the fun- 
damental frequency (i.e., perceived pitch) of the voice (Sundberg, 
1987). Different intrinsic and extrinsic laryngeal muscles inter- 
act to regulate fundamental frequency by altering the length 
of the vocal folds, thus changing the rate of vocal-fold vibra- 
tion (Hirano et al., 1969; Sundberg, 1987). The precise control 
of laryngeal muscles is maintained in part by laryngeal reflexo- 
genic control systems, in which receptors within the larynx adjust 
muscular contractions during perturbations. For instance, during 
vocalization, the uneven airflow passing through the glottis stim- 
ulates the myotatic mechanoreceptors in the intrinsic laryngeal 
muscles; these stretch-sensitive receptors initiate reflexive mus- 
cular adjustments to ensure that the vocal folds remain at the 
intended length and tension and therefore maintain a steady vocal 
pitch (Wyke, 1974). Additional reflexogenic systems work in con- 
cert with the intrinsic laryngeal reflexogenic system to ensure a 
stable vocalization (Wyke, 1974). Vocalization also involves the 
coordination of many other muscles, including the diaphragm 
and abdominal/thoracic muscles to provide airflow and regulate 
vocal output intensity, and articulatory muscles (e.g., lip, jaw, and 
tongue muscles, Hardcastle, 1976; Sundberg, 1987). The articula- 
tory muscles contain somatosensory receptors that play a role in 
generating different vocal-tract configurations, which shape the 
formant frequencies that contribute toward vowel formation and 
vocal timbre (Sundberg, 1987; Jürgens, 2002; Perkell, 2012).

Singing provides a unique opportunity to examine music performance: the musical
instrument is contained wholly within the body, thus eliminating the need for creating
artificial instruments or tasks in neuroimaging experiments. Here, more than two decades
of voice and singing research will be reviewed to give an overview of the sensory-motor
control of the singing voice, starting from the vocal tract and leading up to the brain regions
involved in singing. Additionally, to demonstrate how sensory feedback is integrated with
vocal motor control, recent functional magnetic resonance imaging (fMRI) research on
somatosensory and auditory feedback processing during singing will be presented. The
relationship between the brain and singing behavior will also be explored by examining:
(1) neuroplasticity as a function of various lengths and types of training, (2) vocal amusia
due to a compromised singing network, and (3) singing performance in individuals
with congenital amusia. Finally, the auditory-motor control network for singing will be
considered alongside dual-stream models of auditory processing in music and speech
to refine both these theoretical models and the singing network itself.

FIGURE 1 | Neural networks of vocal motor control (central column),
somatosensory (left) and auditory feedback processing (right), and
hypothesized regions of sensory-motor control of voice [modified from
a model proposed by Jürgens (2009)]. The vocal motor control hierarchy
starts with the generation of complete vocal patterns from the reticular
formation and phonatory motoneurons (white boxes), and then the next
highest level of control (green boxes) stems from the anterior cingulate
cortex (ACC) and periaqueductal gray (PAG), which can initiate and
emotionally motivate vocal responses. The highest level of vocal control
comes from the primary motor cortex (M1, blue box; its modulatory brain
regions are not depicted), which is responsible for producing learned
vocalizations (i.e., speech and song). Somatosensory feedback (dotted
arrow) from various receptors distributed throughout the vocal tract is
processed in the ascending somatosensory pathway (yellow boxes, left;
black slanted lines indicate that only selected regions of this pathway are
shown) and transmitted to the primary and secondary somatosensory
cortex (S1, S2). Auditory feedback (dashed arrow) from the vocalization is
processed by the ascending auditory pathway and auditory cortical regions
(orange boxes, right). Potential neural regions that integrate sensory
feedback processing with vocal motor control are indicated with
red-outlined boxes, and their shared connections are represented by red
arrows: (A) the PAG, (B) ACC, and (C) the insula (in purple, classified as a
higher-order associative area).

Similar to the somatosensory contribution to reflexogenic
vocal control systems, auditory feedback also plays a role in
reflex-like adjustments of ongoing vocal motor control. For
instance, a slight decrease in auditory feedback amplitude elicits
a quick increase in vocal output amplitude, which is known
as the Lombard reflex (Lombard, 1911). During speech pro- 
duction, when the first formant frequency is shifted so that a 
produced vowel (e.g., /e/) sounds like a different one (e.g., /ae/), 
the vocal motor system immediately compensates for the for- 
mant shift (Houde and Jordan, 1998, 2002; Purcell and Munhall, 
2006a,b). Arguably, the most relevant auditory-vocal motor cor- 
rection for singers deals with vocal pitch. When the pitch of 
auditory feedback is shifted up or down as participants vocal- 
ize for a few seconds (either at a comfortable pitch or to match 
a target pitch), investigators have observed pitch-shift responses, 
during which vocal pitch is adjusted quickly in the opposite direc- 
tion of the feedback shift (Anstis and Cavanagh, 1979; Burnett 
et al., 1998; Larson, 1998; Hain et al., 2000; Jones and Munhall,
2000, 2005; Larson et al., 2000; Burnett and Larson, 2002; Liu
and Larson, 2007; Jones and Keough, 2008). These pitch-shift 
responses often have two components: (1) an early pitch-shift 
response of 25-50 cents (irrespective of the pitch-shift magni- 
tude) that occurs 100-150 ms after the pitch shift; and (2) a 
late pitch-shift response with a latency of 250-600 ms, whose 
magnitude and direction can be under voluntary control if listeners
are instructed to make a specific response (e.g., change
pitch to either oppose or follow the pitch shift; Burnett
et al., 1998; Larson, 1998; Hain et al., 2000). Interestingly, prolonged
exposure to feedback that is incrementally pitch-shifted
over numerous trials can produce aftereffects in which intended 
vocal pitch and vocal output are mismatched, such that vocal 
pitch is automatically adjusted even when auditory feedback is 
returned to normal (Jones and Munhall, 2000, 2005; Jones and 
Keough, 2008). 
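
The pitch-shift magnitudes above are expressed in cents, a logarithmic unit in which 100 cents equal one equal-tempered semitone and 1200 cents equal one octave. As a minimal sketch of what these magnitudes mean in Hz, the Python snippet below converts between frequency ratios and cents. The baseline of 220 Hz, the 200-cent feedback shift, and the function names are illustrative assumptions, not values or procedures taken from the studies cited here.

```python
import math


def cents_between(f_ref_hz: float, f_hz: float) -> float:
    """Signed interval from f_ref_hz to f_hz in cents (1200 cents = 1 octave)."""
    return 1200.0 * math.log2(f_hz / f_ref_hz)


def shift_by_cents(f_hz: float, cents: float) -> float:
    """Frequency obtained by shifting f_hz up (+) or down (-) by `cents`."""
    return f_hz * 2.0 ** (cents / 1200.0)


# Hypothetical example: a participant sustains a note at 220 Hz (assumed value).
baseline_hz = 220.0

# Auditory feedback is shifted upward by an assumed 200 cents (two semitones).
feedback_hz = shift_by_cents(baseline_hz, 200.0)

# An early pitch-shift response of 25-50 cents opposes the shift,
# so the produced pitch moves downward from the baseline.
early_small_hz = shift_by_cents(baseline_hz, -25.0)
early_large_hz = shift_by_cents(baseline_hz, -50.0)

print(f"feedback heard at {feedback_hz:.1f} Hz")
print(f"early response range: {early_large_hz:.1f} to {early_small_hz:.1f} Hz")
```

Because the scale is logarithmic, the same 25-50 cent response corresponds to a larger change in Hz for a higher baseline pitch, which is one reason such responses are reported in cents rather than Hz.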

NEURAL NETWORKS GOVERNING SENSORY-MOTOR CONTROL OF 
VOCALIZATION 

Brain regions involved in vocal motor control 

Multiple neural networks are required for precise control of the 
"phonatory" muscles mentioned above. The reticular formation 
of the pons and medulla has direct connections to the motoneurons
for all phonatory muscles (Figure 1, white boxes; Thoms
and Jürgens, 1987), and thus may coordinate phonatory muscle
groups to generate complete vocal patterns (Jürgens and
Hage, 2007). This region receives excitatory input from two distinct
neural pathways of vocal control (Figure 1; Jürgens, 2009;
Owren et al., 2011). The first vocal control pathway (Figure 1, 
green boxes) contains the anterior cingulate cortex (ACC) and 
the midbrain periaqueductal gray (PAG), both of which produce 
vocalizations when stimulated electrically or pharmacologically 
(Müller-Preuss and Jürgens, 1976; Müller-Preuss et al., 1980; Suga
and Yajima, 1988; Dujardin and Jürgens, 2005). The second neural
pathway includes the primary motor cortex (M1; Figure 1,
blue box) and two subcortical loops (comprising the putamen,
globus pallidus, pontine gray, and cerebellum) that modulate
vocal motor commands from M1 and subsequently send modified
motor programs via the ventrolateral thalamus back to M1;
electrical stimulation of the ventral part of M1 elicits vocalizations,
as well as individual movements of the jaw, tongue, and lips
(Penfield and Rasmussen, 1950).



In humans, these networks form a tripartite hierarchy of vocal
motor control (Figure 1, center column; Simonyan and Horwitz,
2011): (1) the reticular formation constitutes the lowest level, at
which complete vocal patterns are generated; (2) the next level
comprises the ACC and the PAG, which are credited with the
voluntary initiation and emotional/motivational control of vocalizations
(Jürgens, 2002, 2009); and (3) the highest level of vocal
control occurs in M1 (and its modulatory brain regions), which
is associated with the generation of learned vocalizations, such
as speech and song (Jürgens, 2002, 2009). Importantly, this functional
distinction of M1 is based on humans' unique possession of
direct connections between the phonatory region of M1 (i.e., the
ventral portion) and the motoneurons of phonatory muscles (see
Figure 1); bilateral lesions to this M1 region destroy the ability
to speak and sing (Jürgens, 2009), while innate vocalizations (e.g.,
shrieking, crying, etc.) that may be controlled by the ACC and
PAG are left intact. In contrast, damage to the modulatory brain
regions associated with M1 (e.g., putamen, globus pallidus, pontine
gray, and cerebellum) can result in speech disorders such as
stuttering and dysarthria (Ackermann et al., 1992; Jürgens, 2002;
Alm, 2004). Lesions in the second level of vocal control may lead
to mutism (attributed to PAG damage; Esposito et al., 1999) or
loss of emotional/motivational intonation in speech (following
damage to the ACC; Simonyan and Horwitz, 2011). Importantly,
the functional organization of vocal motor control in humans 
is concurrently hierarchical and parallel, since damage to brain 
regions within the second or third levels does not abolish all 
vocalizations. 

Neural processing of somatosensory feedback 

Various somatosensory receptors transmit feedback about the 
current state of the vocal motor system (e.g., placement of articu- 
lators, respiration, etc.) via the glossopharyngeal and vagus nerves 
and the ascending somatosensory pathway, which includes the 
nuclei gracilis, solitarius, and spinalis nervi trigemini and the 
medial lemniscus in the medulla, and the ventral posteromedial
nucleus in the thalamus (Jürgens and Kirzinger, 1985; Willis,
1986). The thalamus sends somatosensory information to primary
and secondary somatosensory cortex (S1 and S2), as well
as the insula (Jones and Powell, 1970; Augustine, 1996; Jürgens,
2002; Ackermann and Riecker, 2004, 2010). More specifically,
the ventral portion of the primary somatosensory cortex (S1),
which lies posteriorly adjacent to the M1 phonatory area that governs
vocalizations and individual movements of the articulators (Penfield
and Rasmussen, 1950), processes somatosensory information
about articulatory movements (Grabski et al., 2012), while the
anterior portion of the insula is recruited particularly during
overt vocalizations (compared to covert speech and song; Riecker
et al., 2000) and may contribute to voluntarily controlled respiration
during vocalizations in general (Ackermann and Riecker,
2010). 

Neural processing of auditory feedback during singing 

As each sung note reaches a singer's ear as auditory feedback, 
the different frequencies within that particular vocal
pitch are transduced by the organ of Corti on the basilar membrane
of the cochlea (Hudspeth, 2000). The frequency characteristics
that are required to perceive the pitch are transmitted
and/or processed along different parts of the ascending auditory
pathway (comprising the cochlear nucleus, lateral lemniscus,
inferior colliculus, and the medial geniculate nucleus of the thalamus;
Griffiths et al., 2001) before the extracted frequencies
(and many other attributes of sounds) are further processed in 
primary and secondary auditory cortex within Heschl's gyrus. 
In particular, pitch information may be processed specifically 
by a (rightward lateralized) pitch-sensitive area located in lateral 
Heschl's gyrus, reported to be involved in conscious pitch per- 
ception (Griffiths, 2003; Bendor and Wang, 2006). This region 
may also be involved in organizing pitches in a hierarchical 



fashion, since patients with lesions in this region displayed 
much higher discrimination thresholds than controls when asked 
to indicate the direction of pitch change between two notes 
(Johnsrude et al., 2000). Processing pitch changes or melodic
phrases within a sung passage recruits additional auditory cor- 
tical regions outside of Heschl's gyrus, including regions in the 
right superior temporal gyrus (STG), planum polare, and planum 
temporale (Zatorre et al., 1994; Patterson et al., 2002; Hyde
et al., 2008). When pitch comparisons are performed within a 
sequence of tones or short melodies, increased activity is observed 
within right auditory and frontal cortical regions presumably 
during tonal working memory processes, compared to passive 
melody perception (Zatorre et al., 1994). Melodic phrase comparisons
in the same key, which may be done to ensure correct
melodic reproduction, engage extensive activity within several
auditory cortical regions along bilateral STG, whereas melodic
phrase comparisons across a pitch transposition (i.e., a key
change) engage additional activity from the intraparietal sulcus
(IPS; Foster and Zatorre, 2010).

Aside from providing details about vocal pitch, auditory feed- 
back can also provide information about vocal timbre, which is 
argued to be processed specifically along the superior temporal 
sulcus (STS; Belin et al., 2000). Kriegstein and Giraud (2004) discovered
three functionally distinct regions along the STS. The
anterior STS is associated with familiar voice recognition, while 
the mid/anterior STS preferentially responds to the spectral char- 
acteristics of voices. The posterior STS (pSTS), which is recruited 
during recognition of unfamiliar voices, may be involved in ana- 
lyzing spectral details (or the changes therein) of voices over 
time (Kriegstein and Giraud, 2004; Warren et al., 2006). Given
that the pSTS is also recruited in response to presentation of
frequency-modulated sweeps of pure tones (Poeppel et al., 2004)
and phonological processing (Hickok and Poeppel, 2007), this 
region may be involved generally in processing spectrotempo- 
ral fluctuations in sound, including notable changes in auditory 
feedback. 

Potential substrates for integrating sensory feedback with vocal 
motor control 

The constituents of the vocal motor network associated with 
voluntary initiation and emotional/motivational control of 
vocalizations — the PAG and ACC — receive both somatosensory 
and auditory input, and thus form two potential substrates for 
sensory-motor control of vocalization (Figure 1, red-outlined 
boxes and arrows). The PAG (Figure 1A) receives somatosensory 
input via afferent projections from the nucleus gracilis (implicated
in respiratory control; Hannig and Jürgens, 2006) and
nuclei solitarius and spinalis nervi trigemini (kinesthetic and proprioceptive
information; Jürgens and Kirzinger, 1985; Yoshida
et al., 2000), as well as auditory information from the inferior
colliculus and lateral lemniscus (Dujardin and Jürgens, 2005),
all of which may facilitate initiating vocalizations in response 
to external stimuli or adjusting vocalizations based on sensory 
feedback. For example, when connections to the cerebrum are 
severed, the Lombard reflex is preserved during PAG-induced 
vocalizations coupled with auditory masking, suggesting that 
the PAG may govern auditory-motor control during involuntary 
auditory-vocal reflexes (e.g., Lombard reflex, formant- and pitch-shift
responses) without additional control from cortical regions
(Nonaka et al., 1997). The ACC (Figure 1B) directly receives
somatosensory input from S2 and auditory input from auditory
cortical regions along the STG and STS (Jürgens, 1983; Barbas
et al., 1999). This region also receives these types of sensory input
indirectly from S1 and auditory association areas via the insula
(Mesulam and Mufson, 1982; Augustine, 1996). Since the insula 
is a gateway of both somatosensory and auditory information 
for the ACC, this region itself may provide another substrate for 
sensory-motor control of vocalization (Figure 1C, purple box). 
In particular, the anterior insula, whose cytoarchitecture and 
projections classify it as an association area that integrates dif- 
ferent modalities (e.g., auditory, visual, somatosensory, motor, 
etc., Rivier and Clarke, 1997; Lewis et al., 2000; Bamiou et al., 
2003; Ackermann and Riecker, 2004), is engaged specifically dur- 
ing voiced speech and song, relative to covert or internal versions 
(Riecker et al., 2000; but see Hillis et al., 2004; Ackermann and
Riecker, 2010 for conflicting clinical evidence of the insula's role 
in speech production). 

Neuroimaging evidence: a general functional network for human 
vocalization 

Neuroimaging studies from the past two decades have confirmed 
that many regions within vocal motor and sensory networks are 
recruited during various overt speech and song tasks, including: 
word or letter generation (Paus et al., 1993); syllable repetition 
(Riecker et al., 2005); singing a note repeatedly (Perry et al., 
1999), in a sustained fashion (Zarate and Zatorre, 2008), or while 
changing vowels in particular rhythms (Jungblut et al., 2012); 
repeating syllables, spoken words, and sung or hummed melodies 
(Özdemir et al., 2006); humming, speaking, or singing lyrics of
a well-known song (Formby et al., 1989; Jeffries et al., 2003);
reciting the months of the year or singing a familiar melody 
(Riecker et al., 2000); telling a story (Schulz et al., 2005); impro- 
vising word phrases, melodies, or harmonies (Brown et al., 2004, 
2006); spontaneous and synchronized speaking and singing (Saito
et al., 2006); and singing an Italian aria (Kleber et al., 2007).
Taken together, the neuroimaging evidence above indicates that a general
functional network for human vocalization (including speech and
song) comprises the brain regions reviewed in the preceding
sections: M1, ACC, basal ganglia, thalamus, and cerebellum
for vocal motor control; S1 and S2 for somatosensory feedback
processing; bilateral auditory cortical regions (primary auditory 
cortex and a pitch-sensitive region within Heschl's gyrus, various 
portions of STG and STS) for auditory feedback processing; and 
the insula presumably during multimodal processing of sensory 
feedback. In addition, premotor and parietal areas are recruited 
during human vocalization, and their functional roles will be 
further discussed below. 

Until this point, both speech and song studies have been 
included to outline the brain regions associated with general 
vocal control in humans, since speaking and singing employ com- 
mon mechanisms involved in vocal production. Moving forward, 
we will focus more on singing studies to examine how musical 
training modulates the general functional network for human 
vocalization as it is used for singing. 



TRAINING EFFECTS ON THE SENSORY-MOTOR CONTROL OF 
SINGING 

VOCAL TRAINING EFFECTS ON THE NEURAL CORRELATES OF 
SENSORY-MOTOR CONTROL OF SINGING 

In general, due to their extensive auditory-motor training and 
experience, musicians excel in various auditory and motor tasks. 
For instance, previous studies report that musicians perform 
better at pitch, timbre, and voice discrimination tasks than non-musicians
(Kishon-Rabin et al., 2001; Tervaniemi et al., 2005;
Chartrand and Belin, 2006; Micheyl et al., 2006). In addition
to possessing better auditory discrimination skills than non- 
musicians, musicians also display more precise control over the 
vocal apparatus in the absence of proper auditory feedback. For 
example, trained singers sang more accurately with masked audi- 
tory feedback than non-musicians (Schultz-Coulton, 1978), yet 
one study reported the reverse (Watts et al., 2003). However, 
Watts' group of singers may have had less vocal training than 
the singers in Schultz-Coulton's study; Watts suggested that dur- 
ing the earlier stages of vocal training, more emphasis is placed 
on monitoring auditory feedback for vocal accuracy (Watts et al., 
2003), which may account for their recruited singers' greater vocal 
inaccuracy with masked feedback compared to non-musicians. 
In fact, in a longitudinal study with trained singers performing 
various slow and fast singing tasks, vocal accuracy was not differentially
affected by masked auditory feedback either before
or after 3 years of vocal training (Mürbe et al., 2004), which
suggests that auditory feedback may not play a crucial role in
vocal accuracy after extensive vocal training. Nevertheless, vocal
accuracy did improve during slow singing tasks with masked feedback
after vocal training, which Mürbe et al. (2004) attributed
to training-enhanced "neuromuscular memory of pitch" (p. 240). 
This implies that trained singers may rely more on somatosensory 
feedback to make sure that notes are produced properly, since they 
can still sing accurately for some time after losing their hearing 
(Wyke, 1974). Indeed, a functional magnetic resonance imag- 
ing (fMRI) singing study demonstrated that both vocal students 
(enrolled in a performance program) and professional opera 
singers recruited more activity within S1 and somatosensory association
cortex than amateur singers, and moreover, the amount
of singing practice positively correlated with the activity in these
regions (Kleber et al., 2010). In a more recent fMRI study, Kleber
et al. (2013) effectively reduced the amount of somatosensory 
feedback available by applying a topical anesthetic to the vocal 
folds just prior to singing in the MR scanner. The investigators 
determined that under vocal-fold anesthesia, singers displayed
reduced activity in the right anterior insula, whereas non-musicians
showed enhanced insular activity. Additionally,
this region exhibited decreased functional connectivity to M1,
S1, and auditory cortex in singers under topical anesthesia, while
functional connectivity increased between these regions in non-musicians
with anesthetized vocal folds. Notably, singers still sang
more accurately under anesthesia than non-musicians, despite the
observed reduction of insular activity and functional connectivity.
Both of Kleber's experiments provide evidence that: (1) singers
may rely more heavily on somatosensory feedback as a function 
of vocal training and practice, and (2) singers, perhaps by virtue 
of their training, can regulate activity within the right anterior 
insula to "disengage" or ignore somatosensory feedback when it
is perturbed or deemed unreliable and thus may significantly alter 
their singing performance. 

Similar to the somatosensory feedback perturbation induced 
in Kleber's recent study, Zarate and colleagues (2008, 2010b) 
utilized pitch-shifted auditory feedback with fMRI techniques 
to target explicitly the brain regions involved in auditory-vocal 
motor control in singing. As discussed earlier, pitch-altered feed- 
back elicits pitch-shift responses that often contain early and 
late components. Larson and colleagues suggested that the early 
pitch-shift response, which may be governed by the midbrain 
PAG, is a more automatic reaction used to stabilize vocal out- 
put by correcting small, unexpected fluctuations in vocal pitch; 
the late pitch-shift response, on the other hand, may be under 
more voluntary control — perhaps controlled by the auditory cor- 
tex, ACC, etc., — and thus may contribute to vocal pitch control 
during speaking and singing (Burnett et al., 1998; Larson, 1998; 
Hain et al., 2000; Liu and Larson, 2007). Indeed, although trained
singers exhibit early pitch-shift responses to briefly pitch-shifted 
feedback, they were still able to maintain their intended goal 
for vocalization (either sustaining a steady pitch or glissandos, 
Burnett and Larson, 2002; Hafke, 2008), perhaps due to enhanced 
top-down control of the late pitch-shift response that resulted 
from years of vocal training. In contrast, non-musicians may 
not exhibit such precise vocal control over the late pitch-shift 
response. To assess the effects of extensive vocal training on pitch 
control in singing, Zarate and colleagues (2008, 2010b) tested 
singers and non-musicians with two singing tasks that required 
different types of top-down voluntary control: (1) an "ignore" 
task where subjects were required to hold their pitch steady, 
despite hearing pitch-shifted auditory feedback; and (2) a "com- 
pensate" task in which subjects had to voluntarily adjust their 
vocal pitch precisely to correct for the pitch shift. The authors 
hypothesized that ignoring a small pitch shift would not only 
elicit an early pitch-shift response, but also target the PAG rel- 
ative to the compensate task, which was specifically designed to 
engage their proposed cortical substrates for auditory-motor con- 
trol of vocal pitch — auditory cortex, insula, and ACC (Zarate and 
Zatorre, 2008; Zarate et al., 2010b).

Due to the temporal limitations of fMRI methodology, Zarate 
et al. (2010b) were not able to determine whether the PAG is 
involved particularly with eliciting early pitch-shift responses, 
since these responses have a latency that is shorter than the 
best temporal resolution for fMRI. Nevertheless, two interest- 
ing cortical findings from their singing tasks were observed. 
First, both groups recruited the IPS and dorsal premotor cortex 
(dPMC) in each pitch-shifted singing task, compared to singing 
with normal feedback (Zarate and Zatorre, 2008). The authors 
suggested that since the IPS is associated with transformations 
of sensory input for motor preparation (Astafiev et al., 2003;
Grefkes et al., 2004; Tanabe et al., 2005), it was recruited specifically
during transformations of auditory input (see Foster and
Zatorre, 2010; Zatorre et al., 2010; Foster et al., 2013) into spatial
information within the frequency domain (i.e., up or down).
This "frequency spatial information" can then be used by the 
dPMC — an area that receives indirect connections from audi- 
tory and parietal areas via the insula (Mufson and Mesulam, 



1982), and is attributed to conditional sensory-motor associa- 
tions (Petrides, 1986; Chouinard and Paus, 2006) — to prepare 
a vocal response (e.g., maintain steady vocal output or correct 
for the pitch shift). Second, despite the observed lack of per- 
formance differences in the compensate task — i.e., both groups 
voluntarily adjusted for the pitch-shifted feedback to a similar 
extent — different neural substrates for auditory-motor control 
were recruited in each group. Compared to singers, the non- 
musicians exhibited more activity within the dPMC while volun- 
tarily correcting for the pitch shift (Figure 2A; Zarate and Zatorre, 
2008); the authors proposed that the dPMC was recruited selec- 
tively in non-musicians as they learned to associate a pitch-shift 
"cue" in auditory feedback with a corrective adjustment in vocal 
pitch. Therefore, this region may constitute a basic substrate 
for voluntary auditory-motor control of vocal pitch (Zarate and 
Zatorre, 2008) and perhaps music production in general — after 
more training and practice, the dPMC is recruited less in non- 
musicians during the same musical production task that was 
learned (and assessed with fMRI) at earlier stages of an exper- 
iment (Chen et al., 2012). Indeed, rather than recruiting the 
dPMC, singers engaged auditory cortex within the pSTS, ante- 
rior insula, and ACC for this task (Figure 2B; Zarate and Zatorre, 
2008; Zarate et al., 2010b). Moreover, voluntary vocal-control 
singing tasks (i.e., compensating for and ignoring large pitch 
shifts in feedback) specifically enhanced the functional connec- 
tivity between the pSTS and IPS (Figure 2C; Zarate et al., 2010b). 
Given the IPS' role in sensory-motor transformations, Zarate 
and colleagues suggested that within singers, the auditory cor- 
tex and IPS jointly process and extract pitch-shift information 
that can be used to control vocal pitch (e.g., magnitude and 
direction of the pitch shift). Since the auditory cortex is func- 
tionally connected to the insula and ACC (Zarate and Zatorre, 
2008; Zarate et al., 2010b), the pitch-shift information may be 
sent via the anterior insula to the ACC for initiation of the task- 
appropriate vocal motor program (i.e., maintain the originally 
produced note or correct for the shift). The authors proposed that 
these four cortical regions constitute an experience-dependent 
network for auditory-motor control of the singing voice, which 
may be recruited increasingly as a function of more vocal training 
and practice. 

SHORT-TERM TRAINING EFFECTS ON AUDITORY AND VOCAL SKILLS 
AND THEIR NEURAL CORRELATES 

Based on the studies above, trained singers may have more pre- 
cise vocal control compared to non-musicians, due to extensive 
vocal training that recruits an experience-dependent cortical net- 
work and/or selectively gates access to sensory feedback within 
this network. However, Amir et al. (2003) determined that instru- 
mental musicians (without formal vocal training) also sang more 
accurately than non-musicians in a simple pitch-matching task, 
in which subjects were required to sing a note that was just 
presented. Additionally, two studies report a significant corre- 
lation between pitch discrimination and vocal accuracy in both 
instrumental musicians and non-musicians — individuals who 
sang more accurately also had better discrimination skills (Amir 
et al., 2003; Watts et al., 2005). If this observed correlational
relationship is a causal one, as these studies suggest, then refining
pitch-discrimination skills may lead to better vocal accuracy.
For instance, many studies have reported that auditory training
improves pitch discrimination both at the training frequency and
at other non-trained frequencies (Demany, 1985; Delhommeau
et al., 2002, 2005; Ari-Even Roth et al., 2003). Furthermore,
the effects of auditory training with pure tones also generalize
to more complex tones (Grimault et al., 2003). In light of
these observations and the proposed causal relationship between
pitch discrimination and vocal accuracy, the newly enhanced
ability to discriminate between pitches (following training) may
increase the likelihood of detecting slight errors in vocal output,
which may result in increased vocal accuracy. In turn, these
training-induced behavioral changes are often accompanied by
neural plasticity. For example, after non-musicians had received
pitch-discrimination training, improved pitch discrimination was
accompanied by enhanced auditory cortical responses (Bosnyak
et al., 2004). Additionally, when non-musicians were trained to
associate specific piano keys with their corresponding pitches and
play short piano melodies, significant training-induced increases
in cortical activity were observed within auditory, sensorimotor,
frontal, and parietal regions (Bangert and Altenmuller, 2003;
Lahav et al., 2007).

FIGURE 2 | Brain regions involved in auditory-motor control of
singing, as observed in non-musicians and singers. (A) When
voluntarily correcting for a 200-cent pitch shift in auditory feedback
("compensate 200c" task), non-musicians recruited more activity within
the dorsal premotor cortex (dPMC) than singers. (B) Singers engaged the
posterior superior temporal sulcus (pSTS), anterior cingulate cortex (ACC),
and anterior insula (aINS) when performing the "compensate 200c" task.
(C) Analyses of task-modulated functional connectivity revealed that
relative to singing with normal auditory feedback, the 200-cent pitch shift
specifically enhanced functional connectivity between right pSTS and
intraparietal sulcus (IPS) during both the "ignore 200c" and "compensate
200c" tasks, as well as the postcentral gyrus (containing somatosensory
cortex) during the "ignore 200c" task. Data from Zarate and colleagues
(2008, 2010b).

Therefore, to examine whether (1) singing accuracy improves
subsequent to auditory training, and (2) auditory-training-enhanced
singing specifically engages the experience-dependent
network for auditory-motor control in singing (i.e., auditory
cortex, IPS, anterior insula, and ACC), Zarate et al. (2010a)
tested two groups of non-musicians (an experimental group that
received training to improve their auditory discrimination skills,
and a control group that received no training) with auditory
discrimination and singing tasks. In this study, the investiga- 
tors employed more naturalistic melodic singing tasks to target 
the experience-dependent network, since accurate production 
of novel melodies requires auditory-motor control in a simi- 
lar fashion as voluntarily correcting for pitch-shifted feedback; 
the auditory feedback of the currently produced note may be 
monitored in order to produce the correct pitch interval to the 
next note. Although the experimental group displayed enhanced 
auditory discrimination skills and training-induced changes in 
auditory task-associated neural activity (Zatorre et al., 2012), they 
did not show significant improvements in singing performance 
or recruit the experience-dependent network for auditory-motor 
control in singing (Zarate et al., 2010a). Consequently, Zarate
et al. (2010a) concluded that auditory training alone (at least 
in an experimental setting) is not sufficient to improve vocal 
performance or recruit the experience-dependent network for 
auditory-motor control of singing (auditory cortex, IPS, ante- 
rior insula, and ACC); perhaps only simultaneous enhancements 
in both auditory and vocal motor skills via extensive training 
(e.g., voice lessons) would bring forth improvements in vocal 
performance and engage this particular network. 

SENSORY-MOTOR CONTROL OF SINGING IN OTHER POPULATIONS

ACQUIRED VOCAL AMUSIA 

Clinical evidence that complements the proposed roles of the 
auditory cortex, IPS, S1, insula, and premotor regions during
singing comes from case reports of brain lesions that result
in vocal amusia or oral-expressive amusia (for a review, see
Berkowska and Dalla Bella, 2009; Stewart et al., 2009). For 
instance, a woman with cortical atrophy in the right temporal 
lobe and insula, as well as diminished blood flow to right frontal 
and temporal regions, exhibited signs of progressive amusia and 
aprosodia: she gradually became incapable of perceiving and producing
well-known melodies and affective intonation or prosody
in speech (Confavreux et al., 1992). Additionally, a female tango 
singer who suffered a right-lateralized cerebral infarction pre- 
sented with damage to right Heschl's gyrus and STG, inferior 
parietal regions including supramarginal gyrus and S1, and posterior
insula; her music perception was greatly diminished post-stroke
(relative to speech discrimination), and her singing was
considered less stable within single notes, less accurate in pitch, 
and monotonous in affect (Terao et al., 2006). 

While the two previous cases with damage to auditory cortex, 
insula, and other regions within the singing network presented 
with deficits in both music perception and production, two addi- 
tional cases present perhaps the strongest evidence for these 
regions' involvement specifically for singing in the absence of 
impaired auditory perception. In a female patient who suffered 
a stroke in the right hemisphere affecting the lateral frontal lobe 
and M1, STG, insula, S1, and inferior parietal lobe, investigators
observed impaired affective intonation in speech and the
inability to sing pitch intervals accurately, while familiar-song
perception and singing rhythms or melodic contour were relatively
preserved (Murayama et al., 2004). Finally, a male amateur
singer with right-lateralized damage to his posterior temporal 
lobe, inferior parietal lobe, insula, and inferior frontal gyrus pre- 
sented with relatively spared speech comprehension and produc- 
tion, prosodic perception and production, music perception, and 
rhythm production; however, he exhibited specifically impaired 
pitch-interval production (Schon et al., 2004). This rather pure 
case of vocal amusia — in the absence of aphasia, aprosodia, 
and "perceptual" amusia — demonstrates that the damaged brain 
regions, which overlap with the areas outlined by Zarate and colleagues
(2008, 2010b), contribute to the fine-grained sensory-motor
control of singing.

CONGENITAL AMUSIA 

Recall that the same neural network is recruited for singing in 
healthy individuals, irrespective of the amount of vocal train- 
ing or experience (see section Neuroimaging Evidence: A General 
Functional Network For Human Vocalization). However, when 
pitch processing is compromised, as observed in congenital amusia
(Ayotte et al., 2002; Peretz and Hyde, 2003; Foxton et al.,
2004), due to cortical malformations in the STG and inferior
frontal gyrus (Hyde et al., 2007) and disrupted structural and
functional connectivity (Loui et al., 2009; Hyde et al., 2011), it
may be assumed that pitch production in singing would be similarly
affected. Yet, as observed in Murayama et al.'s (2004)
and Schon et al.'s (2004) case reports, a dissociation between pitch
perception and production skills can exist: following a stroke,
spared pitch perception does not necessarily preclude inaccurate
pitch production. Conversely, some individuals with congenital
amusia still can sing pitch changes in the correct direction (e.g., 
up vs. down), match target notes, and sing familiar song excerpts
somewhat accurately, despite observed problems with pitch perception
(Ayotte et al., 2002; Loui et al., 2008; Dalla Bella et al.,
2009; Hutchins et al., 2010).

Based on this behavioral evidence, as well as observations of 
singing in the general population, Berkowska and Dalla Bella 
proffered a "vocal sensorimotor loop" model to outline two func- 
tional pathways within the song system that may explain observa- 
tions of accurate-pitch and poor-pitch singing (Berkowska and 
Dalla Bella, 2009; Dalla Bella et al., 2011). In this model, the
authors list potential brain regions that contribute to mechanisms
underlying singing, based on previous neuroimaging studies (many
of which are included in the section Neuroimaging Evidence: A
General Functional Network For Human Vocalization). These
include: regions within the STG for processing auditory
input, which includes the auditory target to be reproduced and
auditory feedback; dorsal prefrontal cortex, inferior sensorimotor
cortex, area "Spt" within the planum temporale, and insula
for auditory-motor mapping and memory access; supplementary
motor area, ACC, and insula for motor preparation; and ventral
M1 for vocal motor execution. Berkowska and colleagues
also make distinctions between two pathways, a covert pathway
involved in pitch discrimination (that can be compromised
in congenital amusia) and an overt pathway involved in pitch
production, but they do not clarify which of the aforementioned
brain regions belong to each pathway. Congenital amusia may
be due to a structural and functional "disconnection" between 
right auditory and inferior frontal cortical regions that contribute 
to pitch processing — although the right auditory cortex exhibits 
differential responses to pitch changes, the right inferior frontal 
cortex does not show a correlated increase in activity, as it does 
in normal listeners (Hyde et al., 2011). Even though this particular
covert pathway is affected, auditory input (e.g., presented
auditory targets, auditory feedback, etc.) can still be processed by
auditory cortex (Moreau et al., 2009; Peretz et al., 2009; Moreau
et al., 2013). Hypothetically speaking, auditory input may then
be processed further by IPS (depending on the amount of vocal 
training), anterior insula, and premotor regions (dPMC or ACC) 
for auditory-motor control of singing based on Zarate's find- 
ings (Zarate and Zatorre, 2008; Zarate et al., 2010b), rendering 
vocal production relatively spared in some instances of congenital 
amusia. 

COMPARISONS WITH MODELS OF AUDITORY PROCESSING 

Berkowska and Dalla Bella's (2009) and Dalla Bella et al.'s (2011)
vocal sensorimotor loop model for singing, when enriched with 
neuroimaging evidence from Zarate and Zatorre (2008), Hyde 
et al. (2011), and Loui et al. (2009), potentially consists of audi- 
tory and inferior frontal cortex in the covert perception pathway 
(Figure 3, blue arrow), and auditory cortex, IPS, anterior insula, 
and premotor areas in the overt production pathway (Figure 3, 
red arrows). These updated pathways resemble the more rec- 
ognized (and widely debated) dual-stream model for auditory 
processing, which was first proposed by Rauschecker and Tian 
(2000). The dorsal stream was originally suggested to be spe- 
cialized for processing auditory spatial information (the "where" 
pathway), while the ventral stream was credited with processing
auditory object/sound identity information (the "what"
pathway). The scientific debate focuses mostly on competing
accounts and hypotheses of the dorsal stream's contributions, 
which include: (1) processing spectral changes over time (the 
"where in frequency" or "how" pathway, Belin and Zatorre, 
2000); (2) extracting relevant sound features and matching them 
with stored templates of motor responses (the "do" pathway, 
Warren et al., 2005); (3) transforming auditory representations 
of speech into motor programs for speech gestures (Hickok 
and Poeppel, 2000, 2004, 2007); and (4) comparing between 
feedforward and feedback mechanisms (Rauschecker and Scott, 
2009). 

For our purposes here, the most relevant dorsal-stream mod- 
els are the spectrotemporal processing account from Belin and 
Zatorre (2000) and auditory-motor transformation hypothe- 
ses for auditory spatial processing and speech from Warren 
et al. (2005) and Hickok and Poeppel (2000, 2004, 2007). It 
should be noted, however, that the auditory-motor control network
for singing conflicts with the latter two models, in which
area Spt in the planum temporale is the sole neural substrate
for auditory-motor transformations (Hickok and Poeppel, 2000,
2004; Warren et al., 2005; Hickok and Poeppel, 2007). Zarate's
singing research (2008, 2010b) provides empirical evidence both
supporting and, perhaps, updating these dorsal-stream models:
auditory cortex and IPS process and extract pitch changes
from feedback, and the pitch information is sent from these
regions via the insula to premotor areas for vocal motor adjustments.
Therefore, according to these neuroimaging findings,
transformations of task-relevant auditory features into subsequent
motor responses may not take place in only one brain
region, as purported by the Warren et al. and Hickok/Poeppel
models, but rather may be parceled among a network of different
areas within the dorsal auditory stream. Thus, it could
be argued that many brain regions along the dorsal auditory
stream are involved in processing "how" auditory features change
over time before executing or "doing" a specific motor act in
response to these auditory events, regardless of the particular
modality, be it information related to auditory space, speech, or
music.

[Figure 3 diagram: updated vocal sensorimotor loop model, showing the
overt pathway (vocal production) and the covert pathway (auditory
perception).]

FIGURE 3 | A revised version of Berkowska and Dalla Bella's (2009) and
Dalla Bella and colleagues' (2011) vocal sensorimotor loop model for
singing, updated with findings from Zarate and colleagues' (2008,
2010b) fMRI studies. The covert pathway for pitch perception (blue arrow)
includes auditory cortex and inferior frontal gyrus (IFG), while the overt
pathway for vocal pitch production (red arrows) comprises auditory
cortex (STG/STS), intraparietal sulcus (IPS), anterior insula (aINS), anterior
cingulate cortex (ACC), and dorsal premotor cortex (dPMC). Brain regions
that are not visible normally from this lateral brain view are indicated in
boxes outlined with dashes. Box colors are retained from Figure 1: light
orange for auditory processing, green for vocal motor control, purple for
multimodal processing.

CONCLUSION 

This review has drawn on findings from over 20 years of research
to outline a general neural network for song and
speech production (section Neuroimaging Evidence: A General
Functional Network For Human Vocalization). Within this functional
network, cortical substrates that are specific for the
sensory-motor control of singing pitch and are sensitive to
the amount of vocal training have been identified (Figure 4):
the pSTS and IPS for auditory processing and transformation
for motor output (light orange boxes), S1 for somatosensory
processing (yellow box), anterior insula (in purple, both for
auditory-motor integration and somatosensory feedback gating),
and premotor regions for vocal motor preparation and response
initiation (dPMC and ACC, in green). When the auditory-related
findings are placed within a larger framework, namely a dual-pathway
(i.e., perception vs. production), sensory-motor model
for singing (Berkowska and Dalla Bella, 2009), these music-specific
findings can then be linked to broader research interests
in auditory cognition, such as auditory spatial localization and
speech perception/production, due to the auditory-motor control
network's similarity to prevalent dual-stream models of auditory
processing as a whole.

[Figure 4 diagram: training-sensitive sensory-motor areas for singing.]

FIGURE 4 | Neural substrates for sensory-motor control of singing that
are sensitive to the amount of vocal training [based on findings from
Kleber et al. (2010, 2013), Zarate and Zatorre (2008), Zarate et al.
(2010b)]. Brain regions that are not visible normally from this lateral brain
view are indicated in boxes outlined with dashes, and box colors are
retained from Figures 1 and 3. Activity within primary somatosensory
cortex (S1) increases as a function of the amount of weekly vocal practice,
suggesting a greater reliance on somatosensory feedback with more
training and experience. After extensive vocal training and practice, the
anterior insula (aINS) can serve a gating function for somatosensory
feedback. Features within auditory feedback are processed and extracted
by auditory cortex (STG/STS) and the intraparietal sulcus (IPS), and
task-relevant auditory information is sent via the aINS to the dorsal
premotor cortex (dPMC) in people with little to no formal vocal
training, or to the anterior cingulate cortex (ACC) in experienced singers,
to voluntarily adjust vocal output according to the singing task demands.